From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning

Authors

  • Lieke Gelderloos
  • Grzegorz Chrupala
Abstract

We present a model of visually-grounded language learning, based on stacked gated recurrent neural networks, that learns to predict visual features given an image description in the form of a sequence of phonemes. The learning task resembles that faced by human language learners, who need to discover both structure and meaning from noisy and ambiguous data across modalities. We show that our model indeed learns to predict features of the visual context given phonetically transcribed image descriptions, and that it represents linguistic information in a hierarchy of levels: lower layers in the stack are comparatively more sensitive to form, whereas higher layers are more sensitive to meaning.
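The abstract describes the architecture only at a high level, so the following is a minimal pure-Python sketch of that kind of model under stated assumptions, not the authors' implementation. It assumes the standard GRU update/reset-gate formulation (biases omitted), untrained random weights, that only the final state of the top layer is linearly projected to the image feature space, and cosine distance as the error measure. The class and method names (`StackedGRUImageModel`, `layerwise_similarity`), the toy phoneme inventory, the `_` word-boundary symbol and the target vector are all hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def cosine_distance(u, v):
    # Assumed error measure between predicted and actual image features.
    return 1.0 - cosine_similarity(u, v)


class GRULayer:
    """One GRU layer with update gate z and reset gate r; biases omitted for brevity."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        self.Wz, self.Uz = rand_matrix(n_hidden, n_in, rng), rand_matrix(n_hidden, n_hidden, rng)
        self.Wr, self.Ur = rand_matrix(n_hidden, n_in, rng), rand_matrix(n_hidden, n_hidden, rng)
        self.Wc, self.Uc = rand_matrix(n_hidden, n_in, rng), rand_matrix(n_hidden, n_hidden, rng)

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        c = [math.tanh(a + b) for a, b in zip(matvec(self.Wc, x), matvec(self.Uc, rh))]
        # Interpolate between the previous state and the candidate state.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, c)]

    def run(self, xs):
        """Return the hidden state after every input vector in xs."""
        h = [0.0] * self.n_hidden
        states = []
        for x in xs:
            h = self.step(x, h)
            states.append(h)
        return states


class StackedGRUImageModel:
    """Hypothetical: phoneme embeddings -> stacked GRUs -> linear map to image features."""

    def __init__(self, phonemes, emb_dim, hidden_dim, n_layers, image_dim, seed=0):
        rng = random.Random(seed)
        self.index = {p: i for i, p in enumerate(phonemes)}
        self.embeddings = rand_matrix(len(phonemes), emb_dim, rng)
        self.layers = [
            GRULayer(emb_dim if i == 0 else hidden_dim, hidden_dim, rng)
            for i in range(n_layers)
        ]
        self.W_out = rand_matrix(image_dim, hidden_dim, rng)

    def layer_states(self, phoneme_seq):
        """Hidden-state sequences for every layer, bottom layer first."""
        xs = [self.embeddings[self.index[p]] for p in phoneme_seq]
        per_layer = []
        for layer in self.layers:
            xs = layer.run(xs)  # each layer reads the state sequence of the layer below
            per_layer.append(xs)
        return per_layer

    def predict_image_features(self, phoneme_seq):
        # Assumption: only the final state of the top layer is projected.
        top_final = self.layer_states(phoneme_seq)[-1][-1]
        return matvec(self.W_out, top_final)

    def layerwise_similarity(self, seq_a, seq_b):
        """Cosine similarity of the final states of two utterances, per layer."""
        states_a, states_b = self.layer_states(seq_a), self.layer_states(seq_b)
        return [cosine_similarity(a[-1], b[-1]) for a, b in zip(states_a, states_b)]


if __name__ == "__main__":
    phonemes = ["ə", "k", "æ", "t", "d", "ɒ", "g", "_"]  # "_" marks a word boundary
    model = StackedGRUImageModel(phonemes, emb_dim=8, hidden_dim=16, n_layers=3, image_dim=4)
    utterance = ["ə", "_", "k", "æ", "t"]  # roughly "a cat"
    prediction = model.predict_image_features(utterance)
    target = [0.1, 0.2, 0.3, 0.4]  # hypothetical image feature vector
    print("distance to target:", cosine_distance(prediction, target))
    print("per-layer similarity:", model.layerwise_similarity(utterance, ["ə", "_", "d", "ɒ", "g"]))
```

Comparing `layerwise_similarity` scores across layers, for utterance pairs that share form versus pairs that share meaning, is one simple way to probe the kind of form-to-meaning hierarchy the abstract describes; with the untrained weights above the numbers carry no linguistic information.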

Similar resources

A study of viewpoints of English language instructors to motivate Lerner to learning English through curricular; representation of a Model

One of the problems of students’ entrance from secondary education to  university is lack of English language skills  and incentive to improve their learning. This research aims to identify the ways to strengthen English language skills with an emphasis on undergraduate students' motivation. This research is qualitative approach and Grounded theory strategy. The study population has been consis...

Encoding of phonology in a recurrent neural model of grounded speech

We study the representation and encoding of phonemes in a recurrent neural network model of grounded speech. We use a model which processes images and their spoken descriptions, and projects the visual and auditory representations into the same semantic space. We perform a number of analyses on how information about individual phonemes is encoded in the MFCC features extracted from the speech s...

Representations of language in a model of visually grounded speech signal

We present a visually grounded model of speech perception which projects spoken utterances and images to a joint semantic space. We use a multi-layer recurrent highway network to model the temporal nature of spoken speech, and show that it learns to extract both form and meaning-based linguistic knowledge from the input signal. We carry out an in-depth analysis of the representations used by dif...

A Social Semiotic Analysis of Social Actors in English-Learning Software Applications

This study drew upon Kress and Van Leeuwen’s (2006, [1996]) visual grammar and Van Leeuwen’s (2008) social semiotic model to interrogate ways through which social actors of different races are visually and textually represented in four award-winning English-learning software packages.  The analysis was based on narrative actional/reactional processes at the ideational level; mood, perspective, ...

Named Entity Recognition in Persian Text using Deep Learning

Named entities recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...

Publication date: 2016